81 research outputs found

    Combination approaches for multilingual text retrieval

    Syntax-based skill extractor for job advertisements

    © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

    How effective is stemming and decompounding for German text retrieval?

    Acquired under the Swiss National Licences (http://www.nationallizenzen.ch)

    Lessons learned from challenging data science case studies

    In this chapter, we revisit the conclusions and lessons learned of the chapters presented in Part II of this book and analyze them systematically. The goal of the chapter is threefold: firstly, it serves as a directory to the individual chapters, allowing readers to identify which chapters to focus on when they are interested either in a certain stage of the knowledge discovery process or in a certain data science method or application area. Secondly, the chapter serves as a digested, systematic summary of data science lessons that are relevant for data science practitioners. And lastly, we reflect on the perceptions of a broader public towards the methods and tools that we covered in this book and dare to give an outlook towards the future developments that will be influenced by them

    Database search vs. information retrieval : a novel method for studying natural language querying of semi-structured data

    The traditional approach of querying a relational database is via a formal language, namely SQL. Recent developments in the design of natural language interfaces to databases show promising results for querying either with keywords or with full natural language queries and thus render relational databases more accessible to non-tech-savvy users. Such enhanced relational databases essentially use a search paradigm that is common in the field of information retrieval. However, the way systems are evaluated in the database and the information retrieval communities often differs due to a lack of common benchmarks. In this paper, we provide an adapted benchmark data set that is based on a test collection originally used to evaluate information retrieval systems. The data set contains 45 information needs developed on the Internet Movie Database (IMDb), including corresponding relevance assessments. By mapping this benchmark data set to a relational database schema, we enable a novel way of directly comparing database search techniques with information retrieval. To demonstrate the feasibility of our approach, we present an experimental evaluation that compares SODA, a keyword-enabled relational database system, against the Terrier information retrieval system and thus lays the foundation for a future discussion of evaluating database systems that support natural language interfaces

    LILLIE : information extraction and database integration using linguistics and learning-based algorithms

    Querying both structured and unstructured data via a single common query interface such as SQL or natural language has been a long-standing research goal. Moreover, as methods for extracting information from unstructured data become ever more powerful, the desire to integrate the output of such extraction processes with "clean", structured data grows. We are convinced that for successful integration into databases, such extracted information in the form of "triples" needs both 1) to be of high quality and 2) to have the necessary generality to link up with varying forms of structured data. It is the combination of both these aspects, which have heretofore usually been treated in isolation, where our approach breaks new ground. The cornerstone of our work is a novel, generic method for extracting open information triples from unstructured text, using a combination of linguistics and learning-based extraction methods, thus uniquely balancing both precision and recall. Our system called LILLIE (LInked Linguistics and Learning-Based Information Extractor) uses dependency tree modification rules to refine triples from a high-recall learning-based engine, and combines them with syntactic triples from a high-precision engine to increase effectiveness. In addition, our system features several augmentations, which modify the generality and the degree of granularity of the output triples. Even though our focus is on addressing both quality and generality simultaneously, our new method substantially outperforms current state-of-the-art systems on the two widely-used CaRB and Re-OIE16 benchmark sets for information extraction

    Workshop on Novel Methodologies for Evaluation in Information Retrieval

    Information retrieval is an empirical science; the field cannot move forward unless there are means of evaluating the innovations devised by researchers. However, the methodologies conceived in the early years of IR and used in the campaigns of today are starting to show their age, and new research is emerging to understand how to overcome the twin challenges of scale and diversity. With such challenges in mind, it was decided to hold the first Workshop on Novel Methodologies for Evaluation in Information Retrieval. The workshop was composed of two invited talks as well as long and short papers covering a range of important evaluation methods and tools. The workshop was chaired by Mark Sanderson, with co-organization from Julio Gonzalo, Nicola Ferro and Martin Braschler. Invited talks: The invited talks were from Tetsuya Sakai (NewsWatch) and Martin Braschler (Zurich University of Applied Sciences). In both talks, the speakers described approaches to evaluation that did not involve the traditional use of test collections. Sakai spoke on his experience evaluating search engines while working at NewsWatch. The extensive use of query logs was a key part of his talk. Sakai showed the way in which use of such logs allows examination of more complex search behaviors beyond the initial search covered by test collections. In the same vein, Martin Braschler detailed a study of the search facilities on a large number of enterprise web sites. Like Sakai, Braschler chose to look beyond traditional approaches of evaluation by examining not just precision and recall, but other factors such as speed of response and the search engine's coverage of structured data sources held by the enterprise. Refereed papers: Eleven short and long papers were presented at the workshop. The papers are grouped under common themes

    The role of data scientists in modern enterprises : experience from data science education

    "Data Scientist" has often been considered the sexiest job of the 21st century. As a consequence, the spectrum of data science education programs has increased significantly in recent years, and there is a high demand for data scientists at many companies. However, what training is required to become a data scientist? What is the role of data scientists in current enterprises? Is the training well-aligned to the practical needs of a job? In this article, we will address these questions by evaluating a survey of people who were trained in a continuing education program in data science in Switzerland. Our study sheds light on the practical aspects of data science education and how this newly-gained knowledge can successfully be applied in an enterprise. One of the highlights from the point of view of the database community is the important role of SQL in data science